Abstract: The additive rate-distortion function (ARDF) was
developed to universally bound the rate loss in the
Wyner-Ziv problem and has since been instrumental in, e.g.,
bounding the rate loss in successive refinement, universal
quantization, and other multi-terminal source coding settings.
The ARDF is defined as the minimum mutual information over
an additive test channel followed by estimation. In the limit of
high resolution, the ARDF coincides with the true RDF for many
sources and fidelity criteria. In the other extreme, i.e., the limit
of low resolution, the behavior of the ARDF has not previously
been rigorously addressed.

In this work, we consider the special case of quadratic 
distortion and where the noise in the test channel is Gaussian 
distributed. We first establish a link to the I-MMSE relation of 
Guo et al. and use this to show that, for any source, the slope of the
ARDF near zero rate converges to the slope of the Gaussian RDF
near zero rate. We then consider the multiplicative rate loss of the 
ARDF, and show that for bursty sources it may be unbounded, 
contrary to the additive rate loss, which is upper bounded by 1/2 
bit for all sources. We finally show that unconditional incremental 
refinement, i.e., where each refinement is encoded independently 
of the other refinements, is ARDF optimal in the limit of low 
resolution, independently of the source distribution. Our results 
also reveal under which conditions linear estimation is ARDF 
optimal in the low rate regime. 

I. Introduction 

Shannon's rate-distortion function (RDF) for a source X
and distortion measure $d(\cdot,\cdot)$ is given by

$$R(D) = \inf I(X;Y), \tag{1}$$

where the infimum is over all reconstructions Y such that the
expected distortion satisfies $E[d(X,Y)] \le D$. Even though (1)
perhaps appears simple and innocent, it is well known that it
is generally very hard to compute explicitly. In fact, there
exist only very few cases where (1) is known in closed
form, e.g., Gaussian sources with MSE and binary sources with
Hamming distance. In the information-theoretic literature,
several methods have been proposed to approximate the RDF,
e.g., iterative numerical solutions, high-resolution source coding,
and (universal) bounds. In the first case, the Arimoto-Blahut
algorithm is able to numerically obtain the rate-distortion
function for arbitrary finite input/output alphabet sources and
single-letter distortion measures [1]. In the second case, for
continuous alphabet sources, it was shown by Linder and
Zamir that the Shannon lower bound (SLB) is asymptotically
tight for norm-based distortion metrics [2]. Thus, at
asymptotically high coding rates, the RDFs can be approximated by
simple formulas. In the third case, alternative RDFs, which
are easier to compute and analyze, are used to bound the true
RDFs. For example, at general resolution and for difference
distortion measures, the SLB provides a lower bound to
the true RDF for many sources. On the other hand, Zamir
presented in [3] an additive RDF (ARDF), which consists of
an additive test channel followed by estimation. The ARDF
has been shown to be a convenient tool for upper bounding
the rate loss in many source coding problems. In particular, it
was shown in [3] that the additive rate loss in the Wyner-Ziv
problem is at most 1/2 bit for all sources. Similarly, it was
shown by Lastras and Berger in [4] that the additive rate loss
in the successive refinement problem is at most 1/2 bit per
stage. The ARDF has also been successfully applied to upper
bound the rate loss in other multi-terminal problems, cf. Q, 
[5]. In the limit of high resolution, the ARDF coincides with
the true RDF for many sources and fidelity criteria [2]. In
the other extreme, i.e., in the limit of low resolution, the
behavior of the ARDF has not been rigorously addressed.
There has, however, been great interest in the counterpart to
low-resolution source coding, i.e., communication at low SNR,
e.g., ultra-wideband communication [6]. A motivating factor
for considering the low-SNR regime in communications is that
the absolute value of the slope of the capacity-cost function
is large (and therefore small for the cost-capacity function),
which indicates that one gets the most channel capacity per
unit cost at low SNR, as was shown by Verdu Q. Interestingly, 
Verdú also showed that for rate-distortion at low rates, the
most cost-effective operating point, in terms of bits per unit
distortion, is near zero rate [7]. This follows since the absolute
value of the slope of the RDF is minimized when the distortion
approaches its maximum.

In this paper, we are interested in analyzing the ARDF at
low resolutions. We consider the special case of the ARDF
where the test channel's noise is Gaussian and the distortion
measure is the MSE. We establish a link to the mutual
information - minimum mean-squared error (I-MMSE)
relation of Guo et al. [8] and use this to show that, for any
source, the slope of the ARDF near zero rate converges to the
slope of the Gaussian RDF near zero rate. We then consider the
multiplicative rate loss of this ARDF and show that for bursty
sources it may be unbounded. We also show that unconditional
incremental refinement, i.e., where each refinement is encoded
independently of the other refinements, is ARDF optimal
in the limit of low resolution, independently of the source
distribution. In particular, let an arbitrarily distributed source
X be encoded into k representations $Y_i = X + N_i$, $i = 1,\dots,k$,
where the noises $\{N_i\}$ are mutually independent, Gaussian
distributed, and independent of X. Then we show that
$I(X; Y_1,\dots,Y_k) \approx \sum_i I(X; Y_i)$ at low rates. Moreover, the
joint reconstruction follows by simple linear estimation of X
from $\{Y_1,\dots,Y_k\}$. If side information Z, where Z is
independent of $N_i$, $i = 1,\dots,k$, but arbitrarily jointly distributed
with X, is available at both the encoder and the decoder, we
show that $I(X; Y_1,\dots,Y_k \mid Z) \approx \sum_i I(X; Y_i \mid Z)$. In this case,
however, the best conditional estimator $E[X \mid Y_1,\dots,Y_k, Z]$
is generally not linear. We provide the exact conditions for
ARDF optimality of linear estimation in the low-rate regime.

II. Background 

In this section, we present two existing important concepts 
that we will be needing in the sequel, i.e., the additive RDF 
and the I-MMSE relation. 

A. The Additive Rate-Distortion Function 

The additive (noise) RDF, as defined by Zamir in [3],
describes the best rate-distortion performance achievable for
any additive noise followed by optimum estimation, including
the possibility of time sharing (convexification). In the
current paper, we restrict attention to Gaussian noise, MMSE
estimation (MSE distortion), and no time sharing, so we take
the "freedom" to use the notation additive RDF, $R_X^{\mathrm{add}}(D)$, for
this special case (i.e., no minimization over free parameters).
Specifically, let var(X|Y) denote the minimum possible MSE
in estimating X from Y, i.e.,

$$\mathrm{var}(X|Y) = E\big[(E[X|Y] - X)^2\big]. \tag{2}$$

Moreover, let the additive noise N be zero-mean Gaussian
distributed with variance $0 < \theta < \infty$. Then,

$$R_X^{\mathrm{add}}(D) = I(X; X + N), \tag{3}$$

where the noise variance $\theta$ is chosen such that
$D = \mathrm{var}(X|X + N)$.

B. The I-MMSE Relation 

Using an incremental Gaussian channel, Guo et al. [8] were
able to establish an explicit connection between information
theory and estimation theory. For future reference, we include
this result below:

Theorem 1 ([8]). Let N be zero-mean Gaussian of unit
variance, independent of X, and let X have an arbitrary
distribution $P_X$ that satisfies $E[X^2] < \infty$. Then

$$\frac{d}{d\gamma} I(X; \sqrt{\gamma}\,X + N) = \frac{\log_2(e)}{2}\,\mathrm{mmse}(\gamma), \tag{4}$$

where

$$\mathrm{mmse}(\gamma) = E\big[(X - E[X \mid \sqrt{\gamma}\,X + N])^2\big] = \mathrm{var}(X \mid \sqrt{\gamma}\,X + N). \tag{5}$$
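As a quick numerical sanity check of (4), the sketch below assumes a Gaussian source, for which both sides are available in closed form: $I = \frac{1}{2}\log_2(1 + \gamma\sigma_X^2)$ and $\mathrm{mmse}(\gamma) = \sigma_X^2/(1 + \gamma\sigma_X^2)$. The function names and the finite-difference step are illustrative choices and are not taken from [8].

```python
import math

def mutual_info_bits(gamma, var_x=1.0):
    """I(X; sqrt(gamma) X + N) in bits for Gaussian X with variance var_x (assumed source)."""
    return 0.5 * math.log2(1.0 + gamma * var_x)

def mmse(gamma, var_x=1.0):
    """MMSE of estimating Gaussian X from sqrt(gamma) X + N, with N ~ N(0, 1)."""
    return var_x / (1.0 + gamma * var_x)

def immse_gap(gamma, var_x=1.0, h=1e-6):
    """|dI/dgamma - (log2(e)/2) * mmse(gamma)|, using a central finite difference."""
    d_info = (mutual_info_bits(gamma + h, var_x)
              - mutual_info_bits(gamma - h, var_x)) / (2.0 * h)
    return abs(d_info - 0.5 * math.log2(math.e) * mmse(gamma, var_x))
```

For this assumed source, `immse_gap` should be zero up to finite-difference error at any $\gamma > 0$.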

III. Incremental Refinements 

A. The Slope of the ARDF 

We will show that the slope of $R_X^{\mathrm{add}}(D)$ at $D = D_{\max}$ for a
source X with variance $\sigma_X^2$ is independent of the distribution
of X. In fact, the slope is identical to the slope of the RDF
of a Gaussian source $X'$ with variance $\sigma_{X'}^2 = \sigma_X^2$. This is
interesting since the RDF of any zero-mean source X with
variance $\mathrm{var}(X) = \sigma_X^2$ meets the Gaussian RDF at
$D = D_{\max} = \sigma_X^2$. Thus, since the Gaussian RDF can be obtained by
linear estimation, it follows that $R_X^{\mathrm{add}}(D)$ can also be obtained
by linear estimation near $D_{\max}$.

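Lemma 1. Let $Y = \sqrt{\gamma}\,X + N$, where $N \perp\!\!\!\perp X$, X is arbitrarily
distributed with variance $\sigma_X^2$, and N is Gaussian distributed
according to $\mathcal{N}(0, 1)$. Moreover, let $R_X^{\mathrm{add}}(D)$ be the additive
RDF. Then

$$\lim_{D \to D_{\max}} \frac{d}{dD} R_X^{\mathrm{add}}(D) = -\frac{\log_2(e)}{2\sigma_X^2}, \tag{6}$$

irrespective of the distribution of X.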



Remark 1. Interestingly, it was shown by Marco and
Neuhoff [9] that in the quadratic memoryless Gaussian case,
the operational rate-distortion function of the scalar uniform
quantizer (followed by entropy coding) has the same slope
as (6). Thus, in this particular case, the optimal scalar
quantizer is as good as any vector quantizer.
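Lemma 1 can be probed numerically for a non-Gaussian source. The sketch below assumes an equiprobable binary source $X \in \{-1, +1\}$ (unit variance), for which $E[X|Y = y] = \tanh(\sqrt{\gamma}\,y)$ and hence $\mathrm{mmse}(\gamma) = 1 - E[\tanh(\gamma + \sqrt{\gamma}\,N)]$. The slope is formed as $(dI/d\gamma)/(dD/d\gamma)$, with $dI/d\gamma$ taken from the I-MMSE relation (4). The trapezoidal grid, step sizes, and function names are assumptions made for illustration only.

```python
import math

def gaussian_expectation(f, half_width=10.0, steps=4000):
    """Approximate E[f(N)], N ~ N(0, 1), by the trapezoidal rule on [-half_width, half_width]."""
    dy = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * dy
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(y) * math.exp(-0.5 * y * y)
    return total * dy / math.sqrt(2.0 * math.pi)

def mmse_bpsk(gamma):
    """MMSE of an equiprobable X in {-1, +1} observed through sqrt(gamma) X + N."""
    root = math.sqrt(gamma)
    return 1.0 - gaussian_expectation(lambda y: math.tanh(gamma + root * y))

def ardf_slope_bits(gamma, h=1e-5):
    """dR/dD = (dI/dgamma) / (dD/dgamma), in bits per unit distortion."""
    d_info = 0.5 * math.log2(math.e) * mmse_bpsk(gamma)  # I-MMSE relation (4)
    d_dist = (mmse_bpsk(gamma + h) - mmse_bpsk(gamma - h)) / (2.0 * h)
    return d_info / d_dist
```

At small $\gamma$, `ardf_slope_bits` should approach $-\log_2(e)/2$, the slope of the unit-variance Gaussian RDF at $D_{\max} = 1$.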

B. Multiplicative Rate Loss in the Low Rate Regime 

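Recall that in, e.g., the successive refinement problem, the
additive rate loss is no more than 0.5 bit per stage. We will
now show that the multiplicative rate loss may be unbounded.

Let X be a Gaussian mixture source with a density $P_X(x)$
given by $P_X(x) = P_0\,\mathcal{N}(0, \sigma_0^2) + P_1\,\mathcal{N}(0, \sigma_1^2)$, where
$P_0 + P_1 = 1$. The variance $\sigma_X^2$ of X is $\sigma_X^2 = P_0\sigma_0^2 + P_1\sigma_1^2$. The
components' contributions can be parametrized by $\lambda \in [0, 1]$
as follows: $P_0\sigma_0^2 = \lambda\sigma_X^2$ and $P_1\sigma_1^2 = (1 - \lambda)\sigma_X^2$. It will be
convenient to let $\sigma_X^2 = 1$ and $\lambda = 1/2$. Moreover, we shall
assume that $\sigma_1^2 > 1 > \sigma_0^2 > 1/2$. Notice that as $\sigma_1^2 \to \infty$ we
have that $P_1 \to 0$, $P_0 \to 1$, and $\sigma_0^2 \to 1/2$.

At this point, let S = 0 with probability $P_0$ and S = 1 with
probability $P_1$, and let S be an indicator of the two components,
i.e., $X \sim \mathcal{N}(0, \sigma_0^2)$ if S = 0, and $X \sim \mathcal{N}(0, \sigma_1^2)$ if
S = 1. The RDF, conditional on the indicator S, is given by

$$R_{X|S}(D) = \begin{cases} \frac{1}{2}\sum_{i \in \{0,1\}} P_i \log_2(\sigma_i^2/D), & \text{if } 0 < D < \sigma_0^2, \\ \frac{P_1}{2}\log_2\!\left(\frac{P_1\sigma_1^2}{D - P_0\sigma_0^2}\right), & \text{if } \sigma_0^2 \le D < 1. \end{cases}$$

Thus, for $\sigma_0^2 \le D < 1$, the slope of $R_{X|S}(D)$ w.r.t. D is given by

$$\frac{d}{dD} R_{X|S}(D) = -\frac{P_1}{2\ln(2)\,(D - P_0\sigma_0^2)}, \tag{7}$$

which equals $-P_1/\ln(2)$ at $D = D_{\max} = 1$ (since $P_0\sigma_0^2 = 1/2$) and
therefore tends to zero as $\sigma_1^2 \to \infty$ and $P_1 \to 0$. It follows
from this fact and from Lemma 1 that the ratio of the slope
of the ARDF to the slope of the conditional RDF grows
unboundedly as $\sigma_1^2 \to \infty$. Moreover, as $\sigma_1^2 \to \infty$, $\sigma_0^2 \to 1/2$,
which implies that it becomes increasingly easier for the
uninformed encoder/decoder to guess the correct component
of the source. Thus, the conditional RDF converges towards
the true RDF $R_X(D)$, from which it follows that

$$\lim_{\sigma_1^2 \to \infty}\, \lim_{D \to D_{\max}} \frac{R_X^{\mathrm{add}}(D)}{R_X(D)} = \infty.$$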
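The growth of the slope ratio can be illustrated numerically. The sketch below assumes the normalization used above ($\sigma_X^2 = 1$, $\lambda = 1/2$, so that $P_0\sigma_0^2 = P_1\sigma_1^2 = 1/2$) and the reverse water-filling form of the conditional RDF on the branch $\sigma_0^2 \le D \le 1$, namely $\frac{P_1}{2}\log_2\big(P_1\sigma_1^2/(D - P_0\sigma_0^2)\big)$. The ARDF slope is taken from Lemma 1. All function names are hypothetical.

```python
import math

def mixture_params(var1):
    """Return (P0, P1, var0) for sigma_X^2 = 1, lambda = 1/2 and component variance var1 > 1."""
    p1 = 0.5 / var1   # from P1 * var1 = 1/2
    p0 = 1.0 - p1
    var0 = 0.5 / p0   # from P0 * var0 = 1/2
    return p0, p1, var0

def conditional_rdf_bits(d, var1):
    """R_{X|S}(D) on the branch var0 <= D <= 1, where only component 1 is coded."""
    p0, p1, var0 = mixture_params(var1)
    return 0.5 * p1 * math.log2(p1 * var1 / (d - p0 * var0))

def slope_ratio(var1, h=1e-6):
    """Ratio of the ARDF slope (Lemma 1) to the conditional-RDF slope at D_max = 1."""
    cond_slope = (conditional_rdf_bits(1.0, var1)
                  - conditional_rdf_bits(1.0 - h, var1)) / h  # backward difference
    ardf_slope = -0.5 * math.log2(math.e)                     # Lemma 1 with sigma_X^2 = 1
    return ardf_slope / cond_slope
```

Under these assumptions the ratio works out to $1/(2P_1) = \sigma_1^2$, so it grows without bound as the rare component becomes more dominant in variance.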



C. Unconditional Incremental Refinements 

We will now show that unconditional incremental refinement,
i.e., where each refinement is encoded independently
of the other refinements, is ARDF optimal in the limit of low
resolution, independently of the source distribution. This result
is not only of theoretical value but is also useful in practice,
since conditional source coding is generally more complicated
than unconditional source coding. Indeed, creating descriptions
that are individually optimal and at the same time jointly
optimal is a long-standing problem in information theory,
where it is known as the multiple descriptions problem [10].

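Lemma 2. Let X be arbitrarily distributed with variance $\sigma_X^2$,
and let $N_i \perp\!\!\!\perp X$, $i = 0,\dots,k-1$, be a sequence of zero-mean
mutually independent Gaussian sources, each with variance
$\sigma_N^2$. Then

$$I(X; X + N_0, \dots, X + N_{k-1}) = I\big(X; X + \tfrac{1}{\sqrt{k}} N_0\big).$$

Lemma 3. Let $Y_i = \sqrt{\gamma}\,X + N_i$, $i = 0,\dots,k-1$,
where $N_i \perp\!\!\!\perp X$, $\forall i$. Moreover, let X be arbitrarily distributed
with variance $\sigma_X^2$ and let $N_0,\dots,N_{k-1}$ be zero-mean
unit-variance i.i.d. Gaussian distributed. Then

$$\lim_{\gamma \to 0} \frac{1}{\gamma} I(X; Y_0,\dots,Y_{k-1}) = k \lim_{\gamma \to 0} \frac{1}{\gamma} I(X; \sqrt{\gamma}\,X + N_0) = \frac{k\log_2(e)}{2}\,\sigma_X^2$$

and

$$\lim_{\gamma \to 0} \frac{1}{\gamma}\left(\frac{1}{\mathrm{var}(X|Y_0,\dots,Y_{k-1})} - \frac{1}{\sigma_X^2}\right) = k. \tag{8}$$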
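To make the low-rate additivity in Lemma 3 concrete, the sketch below assumes a Gaussian source, where everything is in closed form: by Lemma 2 the k observations act as a single observation at SNR $k\gamma$, so the joint information is $\frac{1}{2}\log_2(1 + k\gamma\sigma_X^2)$, while coding each description on its own costs $k \cdot \frac{1}{2}\log_2(1 + \gamma\sigma_X^2)$. The Gaussian assumption and the function names are illustrative; the lemma itself holds for arbitrary sources.

```python
import math

def joint_info_bits(gamma, k, var_x=1.0):
    """I(X; Y_0..Y_{k-1}) for Gaussian X: one equivalent look at SNR k * gamma (Lemma 2)."""
    return 0.5 * math.log2(1.0 + k * gamma * var_x)

def sum_of_individual_infos_bits(gamma, k, var_x=1.0):
    """Total rate when each of the k descriptions is coded on its own: k * I(X; Y_0)."""
    return k * 0.5 * math.log2(1.0 + gamma * var_x)

def rate_overhead(gamma, k, var_x=1.0):
    """(Sum of individual rates) / (joint information); tends to 1 as gamma -> 0."""
    return sum_of_individual_infos_bits(gamma, k, var_x) / joint_info_bits(gamma, k, var_x)

def inverse_variance_gain(gamma, k, var_x=1.0):
    """(1/gamma) * (1/var(X|Y_0..Y_{k-1}) - 1/var_x), the quantity appearing in (8)."""
    joint_mmse = var_x / (1.0 + k * gamma * var_x)
    return (1.0 / joint_mmse - 1.0 / var_x) / gamma
```

For this assumed source the overhead is close to 1 at small $\gamma$ and grows with $\gamma$, and the quantity in (8) equals k exactly.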

To illustrate the importance of Lemma 3, let us consider
the situation of a zero-mean unit-variance memoryless
Gaussian source X, which is to be encoded successively
in M stages. In stage i, L descriptions $Y_{i,j}$, $j = 1,\dots,L$,
are constructed unconditionally of each other. Thus, for
the same coding rate (at each stage), the joint distortion
$\mathrm{var}(X|Y_{1,1},\dots,Y_{1,L},\dots,Y_{i,1},\dots,Y_{i,L})$ in the ith stage is
worse than if only a single joint description within each stage
had been created. In fact, in the symmetric case where all
individual descriptions within stage i have the same distortion
$d_i$ and rate $r_i$, it can be shown that the joint distortion $D_i$ of
the ith stage is given by

$$D_i = \frac{d_i}{L - (L-1)\,d_i/D_{i-1}} \tag{9}$$

[Figure 1. Curves: unconditional coding with M = 2, unconditional coding with M = 10, and the Gaussian RDF. Horizontal axis: Distortion [dB].]

Fig. 1. Unconditional and conditional successive refinements in the quadratic
Gaussian case.

and the sum-rate at stage i is given by 



Ri 



(10) 



where $D_0 = d_0 = \sigma_X^2$. Since the Gaussian source is successively
refinable, using conditional refinements will achieve
the true RDF given by $R_i^* = \frac{1}{2}\log_2(1/D_i)$, where $D_i$ is given
by (9). On the other hand, the rate required when unconditional
coding is used is given by (10). For comparison, we have
illustrated the performance of unconditional and conditional
coding when the source is encoded into L = 2 descriptions
per stage, for the cases of M = 2 and M = 10 increments
(stages), respectively; see Fig. 1. In this example, $\sigma_X^2 = 1$ and
$D_M = 0.1$. Notice that when using smaller increments, i.e.,
when M = 10 as compared to when M = 2, the resulting rate
loss due to using unconditional coding is significantly reduced.
D. Unconditional Incremental Refinements (Side Information)

The case of additional side information available at the
encoder and the decoder was not considered by Guo et
al. in [8]. Below we generalize Theorem 1 to include side
information:

Lemma 4. Let $Y = \sqrt{\gamma}\,X + N$, where $N \sim \mathcal{N}(0, 1)$ and X
is arbitrarily distributed, independent of N, and of variance
$\sigma_X^2$. Let Z be arbitrarily distributed and correlated with X
but independent of N. Then

$$\lim_{\gamma \to 0} \frac{1}{\gamma} I(X; Y|Z) = \frac{\log_2(e)}{2}\,\mathrm{var}(X|Z). \tag{11}$$

¹A rigorous proof of the convergence is omitted due to space considerations.



Corollary 1. Let $Y_i = \sqrt{\gamma}\,X + N_i$, $i = 0,\dots,k-1$,
where $N_i \perp\!\!\!\perp X$, $\forall i$. Let X be arbitrarily
distributed with variance $\sigma_X^2$ and let $N_0,\dots,N_{k-1}$ be zero-mean
unit-variance i.i.d. Gaussian distributed. Let Z be arbitrarily
distributed and correlated with X but independent of
$N_i$, $\forall i$. Then

$$\lim_{\gamma \to 0} \frac{1}{\gamma} I(X; Y_0,\dots,Y_{k-1}|Z) = \frac{k\log_2(e)}{2}\,\mathrm{var}(X|Z).$$



E. Conditions for Optimality of Linear Estimation 

It was recently shown by Akyol et al. [11] that for an
arbitrarily distributed source X, contaminated by Gaussian
noise N, the MMSE estimator of X given $Y = \sqrt{\gamma}\,X + N$
converges in probability to a linear estimator in the limit
where $\gamma \to 0$. In contrast, we show that the
conditional MMSE estimator E[X|Y, Z] with side information
Z, where Z is independent of N but arbitrarily correlated
with X, is generally not linear.

Lemma 5. Let $Y = \sqrt{\gamma}\,X + N$, where $N \perp\!\!\!\perp X$, X is arbitrarily
distributed with variance $\sigma_X^2$, and N is Gaussian distributed
according to $\mathcal{N}(0, 1)$. Moreover, let Z be arbitrarily
distributed, independent of N but arbitrarily correlated with X.
Then the conditional MMSE estimator E[X|Y, Z] is linear if
and only if

$$E[\mathrm{var}(X|Z = z)^2] = \mathrm{var}(X|Z)^2, \tag{12}$$

where

$$E[\mathrm{var}(X|Z = z)^2] \triangleq E_Z\big[\big(E_X[(E_X[X|Z = z] - X)^2]\big)^2\big]$$

and

$$\mathrm{var}(X|Z)^2 \triangleq \big(E_X[(E_X[X|Z] - X)^2]\big)^2.$$

In the case where X and Z are jointly Gaussian, it is easy to
show that (12) is satisfied and, thus, the MMSE estimator
E[X|Y, Z] is trivially linear in both Y and Z.
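The linearity condition compares the mean of the squared conditional variance with the square of its mean. As a minimal sketch, assume Z takes finitely many values with known probabilities and known conditional variances var(X|Z = z); the gap below is then zero exactly when the conditional variance does not depend on z. The function name and the discrete-Z setting are assumptions made for illustration.

```python
def linearity_gap(probs, cond_vars):
    """E_Z[var(X|Z=z)^2] - (E_Z[var(X|Z=z)])^2 for a finite-valued Z (hypothetical helper)."""
    mean_var = sum(p * v for p, v in zip(probs, cond_vars))
    mean_sq_var = sum(p * v * v for p, v in zip(probs, cond_vars))
    return mean_sq_var - mean_var ** 2
```

For example, equal conditional variances give a gap of zero, while conditional variances 0.5 and 1.5 with equal probabilities give a strictly positive gap, so the condition fails in that case.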
Appendix

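Proof of Lemma 1: The additive RDF $R_X^{\mathrm{add}}(D)$ is defined
parametrically by $R_X^{\mathrm{add}}(\gamma) = I(\gamma)$, $D(\gamma) = \mathrm{mmse}(\gamma)$,
which implies that

$$R_X^{\mathrm{add}}(D(\gamma)) = I(\gamma). \tag{13}$$

From the derivative of a composite function, it follows that

$$\frac{d}{d\gamma} I(\gamma) = \frac{d}{dD} R_X^{\mathrm{add}}(D)\,\frac{d}{d\gamma} D(\gamma). \tag{14}$$

We know that $I(\gamma) = I(X; \sqrt{\gamma}\,X + N)$ can be expanded as [8]

$$I(\gamma) = \log_2(e)\Big[\tfrac{1}{2}\gamma\sigma_X^2 - \tfrac{1}{4}\gamma^2\sigma_X^4 + \tfrac{1}{6}\gamma^3\sigma_X^6 - \tfrac{1}{48}\big((EX^4)^2 - 6EX^4 - 2(EX^3)^2 + 15\big)\gamma^4 + O(\gamma^5)\Big] \tag{15}$$

and that

$$\mathrm{mmse}(\gamma) = \sigma_X^2 - \gamma\sigma_X^4 + \gamma^2\sigma_X^6 + O(\gamma^3). \tag{16}$$

It follows from (15) that

$$\lim_{\gamma \to 0} \frac{d}{d\gamma} I(\gamma) = \frac{\log_2(e)}{2}\,\sigma_X^2. \tag{17}$$

From (16), $\lim_{\gamma \to 0} \frac{d}{d\gamma}\mathrm{mmse}(\gamma) = -\sigma_X^4$. Moreover, since
$\gamma \to 0$ implies $D \to D_{\max} = \sigma_X^2$, we have from (14) that the slope of $R_X^{\mathrm{add}}(D)$
with respect to D at $D = D_{\max}$ is

$$\lim_{D \to D_{\max}} \frac{d}{dD} R_X^{\mathrm{add}}(D) = \frac{\log_2(e)\,\sigma_X^2/2}{-\sigma_X^4} = -\frac{\log_2(e)}{2\sigma_X^2}. \tag{18}$$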

Proof of Lemma 2: Let $Y_i = X + N_i$, $i = 0,\dots,k-1$,
$Y = [Y_0,\dots,Y_{k-1}]^T$, and let Z be the DFT of Y, i.e.,

$$Z_j = \frac{1}{k}\sum_{l=0}^{k-1} Y_l \exp(2\pi \mathrm{i}\, jl/k), \quad j = 0,\dots,k-1. \tag{19}$$

The DC term is given by $Z_0 = X + \frac{1}{k}\sum_{i=0}^{k-1} N_i$. The other
terms, i.e., $Z_j$, $j > 0$, are AC terms and do not contain X
(since X is DC). The AC terms are orthogonal to the DC
component of the noise, i.e., $(Z_0 - X) \perp Z_j$, $j > 0$, and
since the Gaussianity of the noise implies independence, we
are left with only the DC term. Since the $N_i$'s are mutually
independent, the resulting sum-noise component $\frac{1}{k}\sum_{i=0}^{k-1} N_i$
of the DC term has variance $\sigma_N^2/k$. Thus, the DC term is
equivalent to $X + \frac{1}{\sqrt{k}}N$, where N is distributed as $N_i$. This
shows that $I(X; Y) = I(X; Z_0) = I(X; X + \frac{1}{\sqrt{k}}N_0)$. The
lemma is proved. ■
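Proof of Lemma 3: From Lemma 2 it is clear that

$$I(X; Y_0,\dots,Y_{k-1}) = I\big(X; \sqrt{\gamma}\,X + \tfrac{1}{\sqrt{k}}N_0\big). \tag{20}$$

To get to the standard form with unit-variance noise, we may
scale both X and $\sqrt{\gamma}\,X + \frac{1}{\sqrt{k}}N_0$ by $\sqrt{k}$ without affecting their
mutual information, i.e.,

$$I\big(X; \sqrt{\gamma}\,X + \tfrac{1}{\sqrt{k}}N_0\big) = I\big(\sqrt{k}\,X; \sqrt{\gamma}\sqrt{k}\,X + N_0\big). \tag{21}$$

At this point we use that

$$\lim_{\gamma \to 0} \frac{1}{\gamma} I(X'; \sqrt{\gamma}\,X' + N) = \frac{\log_2(e)}{2}\,\sigma_{X'}^2, \tag{22}$$

where $X' = \sqrt{k}\,X$ and $\sigma_{X'}^2 = k\sigma_X^2$. This proves the first part
of the lemma. By using well-known linear estimation theory,
it is easy to show that

$$\frac{1}{\mathrm{lmmse}(X|Y_0,\dots,Y_{k-1})} = \frac{1}{\mathrm{var}(X)} + \frac{\gamma}{\mathrm{var}\big(\frac{1}{\sqrt{k}}N_0\big)} = \frac{1}{\sigma_X^2} + \gamma k, \tag{23}$$

where $\mathrm{lmmse}(X|Y_0,\dots,Y_{k-1})$ denotes the MSE due to
estimating X from $Y_0,\dots,Y_{k-1}$ using linear estimation. We now
invoke the fact that linear estimation is optimal in the limit
$\gamma \to 0$ and re-order the terms in (23) to get (8). ■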
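Proof of Lemma 4: We will extend the proof technique
used in [8, Lemma 1] to allow for arbitrary conditional
distributions. To do this, we make use of the fact that $Y - X - Z$
forms a Markov chain (in that order), which will allow us to
simplify the decomposition of their joint distribution.

Let $E_\xi$ denote expectation with respect to $\xi$. We first expand
the conditional mutual information in terms of the divergence,
i.e.,

$$\begin{aligned}
I(X; Y|Z) &= E_Z D(P_{XY|Z} \,\|\, P_{X|Z} P_{Y|Z}) \\
&= E_{Z,\{X|Z\}} D(P_{Y|Z,X} \,\|\, P_{Y|Z}) \\
&= E_{Z,\{X|Z\}} \big[D(P_{Y|X} \,\|\, P_{Y'|Z'}) - D(P_{Y|Z} \,\|\, P_{Y'|Z'})\big],
\end{aligned} \tag{24}$$

where $P_{Y'|Z'}$ can be chosen arbitrarily as long as
$D(P_{Y|X} \,\|\, P_{Y'|Z'})$ and $D(P_{Y|Z} \,\|\, P_{Y'|Z'})$ are both well-defined.
Let $Y'|Z' \sim \mathcal{N}\big(\sqrt{\gamma}\,E[X|Z],\, 1 + \gamma\,\mathrm{var}(X|Z)\big)$.

The first term in (24) is the divergence between two
Gaussian distributions, since Y conditioned on (Z, X) has the
same distribution as Y conditioned on X, which is Gaussian
($\sqrt{\gamma}\,X$ plus the Gaussian noise N), and $Y'|Z'$ is Gaussian
by construction. In this case we have [8]

$$\lim_{\gamma \to 0} \frac{1}{\gamma} E_{X|Z} D(P_{Y|X} \,\|\, P_{Y'|Z'}) = \lim_{\gamma \to 0} \frac{1}{2\gamma}\log(1 + \gamma\,\mathrm{var}(X|Z)) = \frac{1}{2}\,\mathrm{var}(X|Z), \tag{25}$$

where we used that $\lim_{\gamma \to 0} \frac{1}{\gamma}\log(1 + \gamma c) = c$. Here log denotes
the natural logarithm, so (25) is in nats; converting to bits gives
the factor $\log_2(e)/2$ in (11).

We now look at the second expression in (24) and use
the Markov condition to get $P_{Y|Z} = E_{X|Z}[P_{Y|X,Z}] =
E_{X|Z}[P_{Y|X}]$. With this, we may adapt the proof technique
of [8] to obtain:

$$\begin{aligned}
\log\frac{P_{Y|Z}(y|z)}{P_{Y'|Z'}(y|z)}
&= \log\frac{E_{X|Z=z}\big[\exp\big(-\tfrac{1}{2}(y - \sqrt{\gamma}X)^2\big)\big]}{\exp\Big(-\frac{(y - \sqrt{\gamma}E[X|z])^2}{2(1 + \gamma\,\mathrm{var}(X|z))}\Big)} + \frac{1}{2}\log(1 + \gamma\,\mathrm{var}(X|z)) \\
&= \log E_{X|Z=z}\Big[\exp\Big(\frac{(y - \sqrt{\gamma}E[X|z])^2}{2(1 + \gamma\,\mathrm{var}(X|z))} - \frac{1}{2}(y - \sqrt{\gamma}X)^2\Big)\Big] + \frac{1}{2}\log(1 + \gamma\,\mathrm{var}(X|z)) \\
&\overset{(a)}{=} \log E_{X|Z=z}\Big[1 + \sqrt{\gamma}\,y(X - E[X|z]) + \frac{\gamma}{2}\big(y^2(X - E[X|z])^2 - y^2\,\mathrm{var}(X|z) - X^2 + E[X|z]^2\big) + o(\gamma)\Big] \\
&\qquad + \frac{1}{2}\log(1 + \gamma\,\mathrm{var}(X|z)) \\
&= \log\big(1 - \tfrac{\gamma}{2}\,\mathrm{var}(X|z)\big) + \tfrac{1}{2}\log(1 + \gamma\,\mathrm{var}(X|z)) + o(\gamma) \\
&= o(\gamma),
\end{aligned}$$

where (a) follows by using a series expansion of $\exp(\cdot)$ in
terms of $\gamma$. We have thus established that the second term
of (24) goes to zero (as a function of $\gamma$) faster than the first
term. Thus, the first term dominates the conditional mutual
information for small $\gamma$. This completes the proof. ■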



Proof of Lemma 5: We first consider the unconditional
case, where $Z = \emptyset$. Let us assume that $EX = \mu_X \neq 0$. Recall
that $Y = \sqrt{\gamma}\,X + N$, where $EN = 0$ and $\sigma_N^2 = 1$. For small
$\gamma$, the optimal estimator is linear, and we have that

$$E[X|Y] \approx \mu_X + \alpha(Y - \sqrt{\gamma}\,\mu_X), \tag{26}$$

where $\alpha$ is the Wiener coefficient given by $\alpha = \mathrm{cov}(X, Y)/\mathrm{var}(Y) \approx
\sqrt{\gamma}\,\sigma_X^2$. From (16), we know that the MMSE behaves as:

$$\mathrm{var}(X|Y) \approx \sigma_X^2 - \gamma\sigma_X^4. \tag{27}$$

On the other hand, in the conditional case with side information
Z, for each Z = z the source has mean E[X|Z = z] and
variance var(X|Z = z). Using this in (26), and fixing Z = z,
leads to

$$E[X|Y, Z = z] \approx E[X|Z = z] + \alpha_z\big(Y - \sqrt{\gamma}\,E[X|Z = z]\big), \tag{28}$$

where the Wiener coefficient depends on z, i.e., $\alpha_z =
\sqrt{\gamma}\,\mathrm{var}(X|z)$. Using (16) for a fixed Z = z yields

$$\mathrm{var}(X|Y, Z = z) \approx \mathrm{var}(X|Z = z) - \gamma\,\mathrm{var}(X|Z = z)^2. \tag{29}$$

Taking the average over Z results in

$$\mathrm{var}(X|Y, Z) \approx \mathrm{var}(X|Z) - \gamma\,E_Z[\mathrm{var}(X|Z = z)^2], \tag{30}$$

where $\mathrm{var}(X|Z) = E_Z[\mathrm{var}(X|Z = z)]$. By Jensen's inequality,
it follows that

$$E_Z[\mathrm{var}(X|Z = z)^2] \ge \mathrm{var}(X|Z)^2, \tag{31}$$

with equality if and only if the conditional variance
var(X|Z = z) is independent of the realization of z. Thus,
comparing (30) to (27) shows that the linear estimator is
generally not optimal.